A PARALLEL VARIABLE METRIC OPTIMIZATION ALGORITHM


By Terry A. Straeter 
Langley Research Center 

SUMMARY 

An algorithm is introduced which is designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers with such advanced features. If p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established in the paper. In addition, the convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space; in fact, for a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques currently in use. In fact, the experiments indicate that even when the computations are done serially, the new algorithm is very competitive with the widely used Davidon-Fletcher-Powell technique.

INTRODUCTION 

In order to solve optimization problems efficiently by using computers with parallel 
computing or vector streaming (pipeline) capabilities, it is necessary either to develop 
new algorithms or to induce a high degree of parallelism into existing techniques. It may 
be that for these new computers many currently fashionable and serially efficient algo- 
rithms will be replaced by new parallel techniques. For simplicity, hereafter the term 
parallel is used to mean both parallel and vector streaming operations. 

For a number of years it has been very popular and quite efficient to solve optimization problems on serial computers by the conjugate gradient (ref. 1) or variable metric (refs. 2 and 3) methods. But, as Miranker points out in his survey article (ref. 4) discussing parallelism in numerical analysis, these methods are inherently serial as each new search direction requires the result of the previous search. Hence, most researchers have concentrated their efforts on developing parallel univariant minimization techniques (see Avriel and Wilde (ref. 5) and Karp and Miranker (ref. 6)) and on modifying Powell's method (ref. 7) with Zangwill's modification (ref. 8). (See Chazan and Miranker (ref. 9).)

The purpose of this paper is to introduce the parallel variable metric (PVM) algorithm, a new technique with a high degree of parallelism for use on the new computers. The basis of the algorithm is an observation by Powell (ref. 10, p. 93) on rank-one variable metric algorithms (refs. 11, 12, and 13) concerning the form of the rank-one update and some work by the author on an early version of the algorithm (ref. 14). If p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p distinct values of the independent variable; then the metric ($V^{(n)}$ as defined later) is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Herein the following results for the PVM algorithm are established: (1) If the function to be minimized is quadratic, defined on a finite-dimensional space, then the iterates converge to the location of the minimum in one cycle. (2) For strictly convex functions on a finite-dimensional space, convergence of the iterates to the location of the minimum follows if the metrics are uniformly bounded. (3) Convergence of the iterates to the minimum is established for the problem of minimizing a strongly positive quadratic functional on a real separable Hilbert space. (4) Finally, the results of numerical experiments involving the application of the parallel variable metric algorithm to sample problems are included. These results are compared with other investigators' results obtained by using a sequential or serial technique (Davidon-Fletcher-Powell) on the same problems. These results indicate that the new algorithm will exploit parallel computing capabilities to effect faster convergence (in terms of total time required to solve the problem) than serial techniques currently in use. In fact, those results indicate that for some problems - even when the PVM computations are done in a serial fashion - the PVM method is competitive, in terms of the number of function and gradient evaluations required to locate the minimum, with the Davidon-Fletcher-Powell (DFP) method, a widely used serial method.

SYMBOLS 

$A$                              self-adjoint, positive, linear operator from $H$ into $H$
$A^{-1}$                         inverse of $A$
$B^{(n)}, V^{(n)}, V_i^{(n)}$    sequences of linear operators
$b$                              fixed element of $H$
$C, V$                           linear operators
$g$                              gradient of $J$
$g_i$                            gradient of $J$ at $x_i$
$H$                              real Hilbert space
$I$                              identity operator
$i, j, k$                        integers
$J$                              functional defined on $H$
$m, M$                           real numbers
$n$                              iteration number
$p$                              degree of parallelism
$R$                              set of all real numbers
$r_n$                            nth residual vector, element of $H$
$s_n$                            direction of nth step, element of $H$
$u, z$                           elements of $H$
$\hat{x}$                        element of $H$ at which $J$ is minimized
$x^n$                            nth iterate, element of $H$
$x, \tilde{x}, x_j$              elements of $H$
$y_i$                            element of $H$ defined by $g_i - g(x^n)$
$\alpha, \beta, \mu$             scalars
$\alpha_n$                       step size, a real number
$\sigma_j$                       basis element of $H$
$\tau_j$                         scalars defined by equations (2)
$(\cdot,\cdot)$                  inner product on $H$
$\|\cdot\|$                      norm on $H$ defined as $(\cdot,\cdot)^{1/2}$

Abbreviations:

DFP      Davidon-Fletcher-Powell
dim H    dimension of space H
PVM      parallel variable metric algorithm
SPQF     strongly positive quadratic functional

PARALLEL VARIABLE METRIC (PVM) ALGORITHM 

Consider the problem of finding the element $\hat{x} \in H$, a real separable Hilbert space with inner product $(\cdot,\cdot)$, which minimizes a differentiable function $J: H \to R$ with gradient $g$. Let $x^1 \in H$ be the initial estimate of the location of the minimum of $J$ and let $V^{(0)}$ be a self-adjoint linear operator from $H$ to $H$. Moreover, let $M_0 \ge m_0 > 0$ be such that for all $x \in H$, $m_0(x,x) \le (x,V^{(0)}x) \le M_0(x,x)$; that is, $V^{(0)}$ is strongly positive. If $J$ happens to be a quadratic functional (i.e., $J(x) = J^0 + (b,x) + \frac{1}{2}(x,Ax)$ with $b \in H$, $J^0 \in R$, and $A$ strongly positive), then $V^{(0)}$ can be interpreted as an initial estimate of $A^{-1}$. Further, let p be a positive integer which will be called the degree of parallelism. If $H$ is finite dimensional, it is advantageous to let p = dim H. Let $\Sigma = \{\sigma_j\} \subseteq H$ represent a Schauder basis for $H$ (ref. 15). If $\dim H = p < \infty$, then let $\sigma_{p+1} = \sigma_1$ and $\sigma_{p+2} = \sigma_2$ and so on. That is, for a finite-dimensional space the $\sigma_j$ vectors represent a recycling through the p basis vectors. The quantities $J(x^1)$ and $g(x^1)$ are computed and the first (n = 1) and successive iterations are obtained as follows:
Step 1. The function and its gradient are computed at p distinct values of the independent variable. For $i = 1, 2, \ldots, p$ let $j = (n-1)p + i$ and

$$ x_j = x^n + \sigma_j $$

and compute $g_j = g(x_j)$ and $J(x_j)$ in a parallel fashion at the p values for $x_j$. If $(g_j,g_j) = 0$ then $x_j$ satisfies the necessary condition for a minimum and computation is stopped provided $J(x_j) \le$ {all other computed values of $J$}.
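Step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `step1` and `evaluate` are hypothetical, the objective is the sample quadratic $f_1$ from the example problems, the offsets $\sigma_j = 0.5\,e_j$ are chosen purely for readability, and a thread pool stands in for the parallel or pipeline hardware the paper has in mind (in CPython, threads do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

def J(x):
    # Sample quadratic f1 from the example problems: x^2 - 2xy + 2y^2 + 5z^2.
    return x[0]**2 - 2*x[0]*x[1] + 2*x[1]**2 + 5*x[2]**2

def g(x):
    # Analytic gradient of the sample quadratic.
    return [2*x[0] - 2*x[1], -2*x[0] + 4*x[1], 10*x[2]]

def evaluate(x):
    # One unit of work: function value and gradient at a single point.
    return J(x), g(x)

def step1(x_n, sigmas, workers=None):
    """Evaluate J and g at the p trial points x_j = x_n + sigma_j concurrently."""
    points = [[a + s for a, s in zip(x_n, sigma)] for sigma in sigmas]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, points))  # order matches `points`
    return points, results

x1 = [1.0, 1.0, 1.0]
sigmas = [[0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5]]  # hypothetical basis scaling
points, results = step1(x1, sigmas)
```

The p evaluations are independent of one another, which is what makes this step the natural place to exploit parallel hardware.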

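Step 2. The metric is modified by p rank-one corrections. Compute the vectors

$$ y_j = g_j - g(x^n) $$

for $j = (n-1)p + i$ with $i = 1, 2, \ldots, p$. Next the residual vectors must be computed. (The reason for this terminology is explained later.) Define

$$ r_{(n-1)p+1} = V^{(n-1)} y_{(n-1)p+1} - \sigma_{(n-1)p+1} $$

and

$$ r_j = V^{(n-1)} y_j - \sigma_j - \sum_{k=1}^{i-1} \frac{\left(r_{(n-1)p+k},\, y_j\right)}{\left(r_{(n-1)p+k},\, y_{(n-1)p+k}\right)}\, r_{(n-1)p+k} \tag{1} $$

where $j = (n-1)p + i$ and $i = 2, 3, \ldots, p$. If the denominator in equation (1) is zero for any term, that term is deleted from the summation.

Compute the p scalars

$$ \tau_j = \begin{cases} -\dfrac{1}{(r_j, y_j)} & \left((r_j,y_j) \ne 0\right) \\ 0 & \left((r_j,y_j) = 0\right) \end{cases} \tag{2} $$

and define

$$ V^{(n)} = V^{(n-1)} + \sum_{j=(n-1)p+1}^{np} \tau_j B^{(j)} \tag{3} $$

where $B^{(j)}: H \to H$ for $j = (n-1)p + i$ and $i = 1, 2, \ldots, p$ is defined such that for all $x \in H$

$$ B^{(j)} x = (x, r_j)\, r_j \tag{4} $$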
Step 3. A single univariant minimization is carried out in the Newton-like direction of search. Let

$$ s_n = -V^{(n)} g(x^n) \tag{5} $$

where $s_n$ is called the direction of search. The step size $\alpha_n$ (a scalar) is computed and the next iterate is defined by $x^{n+1} = x^n + \alpha_n s_n$ and then $J(x^{n+1})$ and $g(x^{n+1})$ are computed. The step size herein is chosen by means of a one-dimensional minimization. This could be done by a number of techniques (refs. 2, 5, and 16). If $\|g(x^{n+1})\|$ is sufficiently small, then stop; otherwise, let n = n + 1 and go to step 1.

Each pass through the algorithm (steps 1 to 3) is called a cycle. Notice that steps 1 and 2 each entail a high degree of parallelism. Step 3 does not directly involve any parallel computation of $J$ or its gradient. However, in the one-dimensional minimization of step 3, any parallel or pipeline structure of the computer can be exploited (ref. 5). For most optimization problems the time required for the evaluation of the function and gradient is much greater than that required for all the other calculations of the algorithm. It is for this reason that optimization algorithms are judged by the total number of function and gradient evaluations required to solve the problem. The parallel computations of $J$ and $g$ called for in step 1 are the main time-saving element of the algorithm, not the other parallel computations of $y_j$, $x_j$ and, to a lesser degree, $r_j$.
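The three steps of a cycle can be put together in a short Python sketch. The function names (`pvm_cycle`, `exact_step`) are hypothetical, the loop below runs serially, and the rank-one corrections are applied one at a time, which is algebraically the same as forming the residuals of equation (1) from $V^{(n-1)}$. The line search is an assumption: an exact step for a quadratic is used here in place of whatever one-dimensional minimization is actually preferred.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def pvm_cycle(grad, x_n, V, sigmas, line_search):
    """One PVM cycle: p gradient differences, p rank-one corrections, one line search."""
    g_n = grad(x_n)
    V = [row[:] for row in V]                         # work on a copy of the metric
    for sigma in sigmas:
        x_j = [a + s for a, s in zip(x_n, sigma)]     # step 1: trial point
        y = [a - b for a, b in zip(grad(x_j), g_n)]   # step 2: gradient difference y_j
        r = [a - s for a, s in zip(matvec(V, y), sigma)]  # residual r_j
        denom = dot(r, y)
        if denom != 0.0:                              # tau_j = 0 when (r_j, y_j) = 0
            tau = -1.0 / denom
            for a in range(len(V)):
                for b in range(len(V)):
                    V[a][b] += tau * r[a] * r[b]      # V <- V + tau_j B, B x = (x, r_j) r_j
    s = [-c for c in matvec(V, g_n)]                  # step 3: Newton-like direction
    alpha = line_search(x_n, s)
    return [a + alpha * c for a, c in zip(x_n, s)], V

# Illustration on the sample quadratic f1 = (1/2)(x, A x), with V(0) = I.
A = [[2.0, -2.0, 0.0], [-2.0, 4.0, 0.0], [0.0, 0.0, 10.0]]

def grad(x):
    return matvec(A, x)

def exact_step(x, s):
    # Exact minimizer of J(x + alpha s) for a quadratic (assumed line search).
    return -dot(s, grad(x)) / dot(s, matvec(A, s))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigmas = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
x2, V1 = pvm_cycle(grad, [1.0, 1.0, 1.0], identity, sigmas, exact_step)
```

For this quadratic with p = 3, the updated metric `V1` agrees with $A^{-1}$ to rounding error and `x2` lands on the minimizer at the origin after a single cycle.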

Figure 1 gives a two-dimensional illustration of the progress of the algorithm with p = 2. The figure depicts the level curves of the function $J$, $x^1$ the initial estimate of the location of the minimum, and $g(x^1)$ the gradient of $J$ at $x^1$. Also shown are $x_1$ and $x_2$, as defined in step 1 by $x_1 = x^1 + \sigma_1$ and $x_2 = x^1 + \sigma_2$. At $x_1$ and $x_2$ the vectors $g(x_1)$ and $g(x_2)$ are shown. With this information, $r_1$, $r_2$, and $V^{(1)}$ can be defined. This then defines the search direction $s_1 = -V^{(1)} g(x^1)$. Finally, the point $x^2$ (found by minimizing $J(x^1 + \alpha s_1)$ with respect to $\alpha$) is shown.

PROPERTIES OF THE OPERATORS 

For any real separable Hilbert space $H$ and differentiable functional $J$, the following theorems can be established:

Theorem 1: $B^{(j)}$, as defined by equation (4), is a self-adjoint positive operator (i.e., for all $x \in H$, $(x,B^{(j)}x) \ge 0$).

Proof: Clearly $B^{(j)}$ is linear by the linearity of the inner product. If $x \in H$, then

$$ (x,B^{(j)}x) = (x,(x,r_j)r_j) = (x,r_j)^2 \ge 0 $$

If $x, y \in H$, then

$$ (x,B^{(j)}y) = (x,(y,r_j)r_j) = (y,r_j)(x,r_j) = (y,(x,r_j)r_j) = (y,B^{(j)}x) $$

Theorem 2: $V^{(n)}$ is self adjoint for all n.

Proof: By definition, $V^{(0)}$ is self adjoint; by theorem 1, the $B^{(j)}$ operators are self adjoint; and by equation (3), $V^{(n)}$ is a finite linear combination of $V^{(0)}$ and the $B^{(j)}$ operators.

To facilitate the proof of later results, define an additional p - 1 linear operators from $H$ to $H$ for each cycle n as follows:

$$ V_0^{(n-1)} = V^{(n-1)} $$

and

$$ V_i^{(n-1)} = V^{(n-1)} + \sum_{j=(n-1)p+1}^{(n-1)p+i} \tau_j B^{(j)} \qquad (i = 1, 2, \ldots, p) \tag{6} $$

Clearly then

$$ V^{(n)} = V_p^{(n-1)} $$

where $\tau_j$ and $B^{(j)}$ are as in equations (2) and (4). Also, it is clear that $V_i^{(n-1)}$ is given by

$$ V_i^{(n-1)} = V_{i-1}^{(n-1)} + \tau_{(n-1)p+i} B^{((n-1)p+i)} \tag{7} $$

and equation (1) can be rewritten as

$$ r_j = V_{i-1}^{(n-1)} y_j - \sigma_j \tag{8} $$

where $j = (n-1)p + i$ and $i = 1, 2, \ldots, p$. Because of the definitions of $V_i^{(n-1)}$, $y_j$, and $\sigma_j$, theorem 3 follows.

Theorem 3: If $\tau_j \ne 0$ or $r_j = 0$, then $V_i^{(n-1)} y_j = \sigma_j$ for $j = (n-1)p + i$ for each n, with $i = 1, 2, \ldots, p$.

Proof: Let i be a positive integer between 1 and p and n be a positive integer. Then, if $r_j = 0$, $\tau_j = 0$ by definition of $\tau_j$ so that $V_i^{(n-1)} = V_{i-1}^{(n-1)}$ and the theorem is true by equation (8). Otherwise, if $\tau_j \ne 0$,

$$ \begin{aligned}
V_i^{(n-1)} y_j &= V_{i-1}^{(n-1)} y_j + \tau_j B^{(j)} y_j && \text{(by eq. (7))} \\
&= V_{i-1}^{(n-1)} y_j - \frac{(y_j,r_j)}{(r_j,y_j)}\, r_j && \text{(by eq. (4))} \\
&= V_{i-1}^{(n-1)} y_j - \left(V_{i-1}^{(n-1)} y_j - \sigma_j\right) && \text{(by eq. (8))} \\
&= \sigma_j
\end{aligned} $$
At this point it is advantageous to define a strongly positive quadratic functional (SPQF). A functional $J$ of the form

$$ J(x) = J^0 + (b,x) + \tfrac{1}{2}(x,Ax) $$

is an SPQF when $J^0 \in R$, $b \in H$, and $A$ is a strongly positive, self-adjoint, linear operator from $H$ into $H$ (i.e., there exist $m, M > 0$ such that $m(x,x) \le (x,Ax) \le M(x,x)$). It is well known that the location of the minimum of $J$, denoted by $\hat{x}$, exists and is equal to $-A^{-1}b$. Also, the gradient of an SPQF is $g(x) = Ax + b$ (ref. 17). Another useful result for an SPQF is that if $\tilde{x} = x + \sigma$ and $y = g(\tilde{x}) - g(x)$, then

$$ A^{-1} y = \sigma \tag{10} $$

Using equation (10) in equation (8) gives $r_j = V_{i-1}^{(n-1)} y_j - A^{-1} y_j$ where $j = (n-1)p + i$. This form of $r_j$ clearly shows the reason for calling $r_j$ the residual vector; that is, $r_j$ is the error at $y_j$ in the approximation of $A^{-1}$ by $V_{i-1}^{(n-1)}$. With these fundamental relations for an SPQF, theorem 4 can now be established.

Theorem 4: If $J$ is an SPQF and $u \in H$ such that $A^{-1}u = V_{i-1}^{(n-1)}u$, and $C: H \to H$ is a linear operator such that $C = V_{i-1}^{(n-1)} + \mu B^{(j)}$ for some real $\mu$, for $j = (n-1)p + i$, then $A^{-1}u = Cu$.

Proof: Since $A^{-1} y_j = \sigma_j$ and $x_j = x^n + \sigma_j$, then

$$ r_j = \left(V_{i-1}^{(n-1)} - A^{-1}\right) y_j $$

Since

$$ \left(V_{i-1}^{(n-1)} - A^{-1}\right) u = 0 $$

then

$$ (r_j,u) = \left(\left(V_{i-1}^{(n-1)} - A^{-1}\right) y_j,\, u\right) = \left(y_j,\, \left(V_{i-1}^{(n-1)} - A^{-1}\right) u\right) = (y_j, 0) = 0 $$

By hypothesis

$$ (C - A^{-1})u = \mu B^{(j)} u = \mu (r_j,u)\, r_j = \mu \cdot 0 \cdot r_j = 0 $$

Since $V_i^{(n-1)} = V_{i-1}^{(n-1)} + \tau_j B^{(j)}$, the following corollaries can be obtained.

Corollary 1: If $V_{i-1}^{(n-1)} u = A^{-1} u$ for some $u \in H$ and $J$ is an SPQF, then $V_i^{(n-1)} u = A^{-1} u$.

Corollary 2 (fundamental property of $V_i^{(n-1)}$): If $J$ is an SPQF and $\tau_q \ne 0$ for all $q \le (n-1)p + i$, then $V_i^{(n-1)} y_k = A^{-1} y_k = \sigma_k$ for all $k \le (n-1)p + i$.

Proof: For any fixed but arbitrary positive integer n, recall from theorem 3 that

$$ V_1^{(n-1)} y_{(n-1)p+1} = \sigma_{(n-1)p+1} $$

and

$$ V_{i-1}^{(n-1)} y_k = \sigma_k \qquad (k = (n-1)p + i - 1) $$

Assume $V_{i-1}^{(n-1)} y_k = \sigma_k$ for all $k \le (n-1)p + i - 1$. Since $\tau_k \ne 0$, then for $k = (n-1)p + i$, $V_i^{(n-1)} y_k = \sigma_k$ by theorem 3. By corollary 1 of theorem 4 and the inductive hypothesis, $V_i^{(n-1)} y_k = \sigma_k$ for all $k \le (n-1)p + i$. Since n was fixed but arbitrary, the corollary is established. Corollary 2 is most useful in later convergence arguments and, hence, it is called the fundamental property of $V_i^{(n-1)}$.

Although at this point several convergence results could be established, first other properties of the sequence of operators $V^{(n)}$ will be derived. For the remainder of this section it is assumed that $J$ is an SPQF. In that case Goldfarb (ref. 18) has observed that $\tau_j$ for $j = (n-1)p + i$ can be expressed as

$$ \tau_j = -\frac{1}{\left(y_j,\, \left(V_{i-1}^{(n-1)} - A^{-1}\right) y_j\right)} $$

Hence equation (7) can be written as

$$ V_i^{(n-1)} = V_{i-1}^{(n-1)} - \frac{B^{(j)}}{\left(y_j,\, \left(V_{i-1}^{(n-1)} - A^{-1}\right) y_j\right)} \tag{11} $$

or if $r_j = 0$, $V_i^{(n-1)} = V_{i-1}^{(n-1)}$.

The preceding observation yields the following theorem which is proved by induction using the Cauchy-Schwarz inequality and equation (11). A similar theorem and proof for the serial rank-one algorithm is given in reference 19.

Theorem 5: If $V^{(0)} \ge A^{-1}$ ($V^{(0)} \le A^{-1}$) then $V_i^{(n)} \ge A^{-1}$ ($V_i^{(n)} \le A^{-1}$) for all n and $i = 1, 2, \ldots, p$. ($V^{(0)} \ge A^{-1}$ means $(x,V^{(0)}x) \ge (x,A^{-1}x)$ for all $x \in H$.)

Again, as in reference 19, by using equation (11), the Cauchy-Schwarz inequality, and theorem 1 a condition can be established under which the $V_i^{(n)}$ and hence the $V^{(n)}$ operators form a monotone sequence of self-adjoint bounded operators.

Theorem 6: If $V^{(0)} \ge A^{-1}$ ($V^{(0)} \le A^{-1}$) then $A^{-1} \le V^{(n)} = V_p^{(n-1)} \le \ldots \le V_1^{(n-1)} \le V^{(n-1)}$ for all n ($V^{(n-1)} \le V_1^{(n-1)} \le \ldots \le V_p^{(n-1)} = V^{(n)} \le A^{-1}$).

Corollary: If $V^{(0)} \ge A^{-1}$ or $V^{(0)} \le A^{-1}$ then the $V^{(n)}$ operators form a monotone sequence of strongly positive, self-adjoint linear operators bounded by $V^{(0)}$ and $A^{-1}$. Moreover, there exists a strongly positive self-adjoint operator $V$ such that $\lim_{n\to\infty} \|V^{(n)}x - Vx\| = 0$ for all $x \in H$.

Proof: By the hypothesis on $V^{(0)}$, theorem 6, and theorem 2, it is well known (ref. 20, p. 189) that the conclusion to the theorem follows.

If dim H is finite and p = dim H, theorem 7 can be proved.

Theorem 7: If $\tau_i \ne 0$ for $i = 1, 2, \ldots, p$ and dim H = p, then $V^{(1)} = A^{-1}$.

Proof: Since $A$ is strongly positive, for each $x \in H$ there exists $z \in H$ such that $x = Az$; hence, $z = A^{-1}x$, and since the $\sigma$ vectors form a basis, there exist scalars $\beta_i$ such that

$$ z = \sum_{i=1}^{p} \beta_i \sigma_i $$

Thus

$$ x = A \sum_{i=1}^{p} \beta_i \sigma_i = \sum_{i=1}^{p} \beta_i A \sigma_i = \sum_{i=1}^{p} \beta_i y_i $$

Now by the fundamental property of $V_p^{(0)}$,

$$ V_p^{(0)} y_i = \sigma_i $$

and

$$ V_p^{(0)} = V^{(1)} $$

hence

$$ V^{(1)} x = \sum_{i=1}^{p} \beta_i V^{(1)} y_i = \sum_{i=1}^{p} \beta_i \sigma_i = z $$

and also

$$ A^{-1} x = z $$

Hence

$$ A^{-1} x = V^{(1)} x $$

for each $x \in H$.

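If dim H is not finite then it can be shown that if $\tau_j \ne 0$ for all j, $V^{(n)} \to A^{-1}$ pointwise as in the following theorem.

Theorem 8: If $\tau_j \ne 0$ for $j = 1, 2, \ldots$ and the $V^{(n)}$ operators are uniformly bounded, then $\lim_{n\to\infty} \|V^{(n)}x - A^{-1}x\| = 0$.

Proof: Let $x \in H$; then there exists $z \in H$ such that $x = Az$ (i.e., $z = A^{-1}x$) and $z = \sum_{i=1}^{\infty} \beta_i \sigma_i$ since the $\sigma$ vectors form a basis for $H$. Now, since $A$ is bounded,

$$ x = A \sum_{i=1}^{\infty} \beta_i \sigma_i = \sum_{i=1}^{\infty} \beta_i A \sigma_i = \sum_{i=1}^{\infty} \beta_i y_i $$

Hence

$$ V^{(n)} x = V^{(n)} \sum_{j=1}^{\infty} \beta_j y_j = \sum_{j=1}^{np} \beta_j V^{(n)} y_j + V^{(n)} \sum_{j=np+1}^{\infty} \beta_j y_j = \sum_{j=1}^{np} \beta_j \sigma_j + V^{(n)} \sum_{j=np+1}^{\infty} \beta_j y_j $$

Therefore

$$ \|V^{(n)}x - A^{-1}x\| = \left\| \left(V^{(n)} - A^{-1}\right) \sum_{j=np+1}^{\infty} \beta_j y_j \right\| \le \left\|V^{(n)} - A^{-1}\right\| \left\| \sum_{j=np+1}^{\infty} \beta_j y_j \right\| $$

Because the $V^{(n)}$ operators are uniformly bounded, $\|V^{(n)} - A^{-1}\|$ is bounded, and since $\sum_{i=1}^{\infty} \beta_i y_i$ converges, then $\sum_{i=np+1}^{\infty} \beta_i y_i \to 0$ as $n \to \infty$; therefore, the right-hand side $\to 0$ as $n \to \infty$.

Corollary: If $V^{(0)} \ge A^{-1}$ or $V^{(0)} \le A^{-1}$ and $\tau_j \ne 0$ for all j, then $\lim_{n\to\infty} \|V^{(n)}x - A^{-1}x\| = 0$.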


CONVERGENCE RESULTS 

By utilizing the previous results, the following convergence theorems can now be 
established. 

Theorem 9: If $J$ is an SPQF, $\tau_i \ne 0$ for $i = 1, 2, \ldots, p$ where $\dim H = p < \infty$, then the algorithm converges to the location of the minimum of $J$ in one cycle.

Proof: At step 3 $s_1 = -V^{(1)} g(x^1)$ and by theorem 7 $V^{(1)} = A^{-1}$ so

$$ s_1 = -A^{-1} g(x^1) = -A^{-1}(Ax^1 + b) = -x^1 - A^{-1}b $$

Hence, $x^2 = x^1 + \alpha_1(-x^1 - A^{-1}b)$, and $\alpha_1 = 1$ is clearly the proper choice of $\alpha$; hence, $x^2 = -A^{-1}b = \hat{x}$.
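As a worked illustration of theorem 9, take the sample quadratic $f_1(x,y,z) = x^2 - 2xy + 2y^2 + 5z^2$ with $x^1 = (1,1,1)$. The matrix form below is a restatement of that function, not an additional assumption; writing $f_1 = \tfrac{1}{2}(x,Ax)$ with $b = 0$ gives

$$ A = \begin{pmatrix} 2 & -2 & 0 \\ -2 & 4 & 0 \\ 0 & 0 & 10 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} 1 & \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{10} \end{pmatrix} $$

$$ g(x^1) = Ax^1 = (0, 2, 10), \qquad s_1 = -A^{-1} g(x^1) = -(1, 1, 1) $$

$$ x^2 = x^1 + \alpha_1 s_1 = (1,1,1) - 1 \cdot (1,1,1) = (0,0,0) = -A^{-1}b $$

so that, once the p = 3 rank-one corrections have produced $V^{(1)} = A^{-1}$, a unit step along $s_1$ reaches the minimizer.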
Theorem 10: If $J$ is an SPQF, the $V^{(n)}$ operators are uniformly strongly positive - that is, there exist $\alpha, \beta > 0$ such that $\alpha I \le V^{(n)} \le \beta I$ - and $H$ is an infinite-dimensional Hilbert space, then the algorithm converges to the location of the minimum of $J$.

Proof: It is sufficient to show that $g(x^n) \to 0$ as $n \to \infty$. Since the step sizes are chosen as a result of a one-dimensional minimization, it is well known (e.g., ref. 19) that necessarily

$$ \frac{\left(s_n, g(x^n)\right)^2}{(s_n, As_n)} \to 0 \quad \text{as } n \to \infty $$

Hence, using the fact that $A$ is strongly positive and the $V^{(n)}$ operators are uniformly strongly positive gives

$$ \frac{\left(s_n,g(x^n)\right)^2}{(s_n,As_n)} \ge \frac{\left(s_n,g(x^n)\right)^2}{M\|s_n\|^2} = \frac{\left(V^{(n)}g(x^n),g(x^n)\right)^2}{M\|V^{(n)}g(x^n)\|^2} \ge \frac{\alpha^2\|g(x^n)\|^4}{M\beta^2\|g(x^n)\|^2} = \frac{\alpha^2}{M\beta^2}\|g(x^n)\|^2 \ge 0 $$

Therefore $\|g(x^n)\|^2 \to 0$, so $g(x^n) \to 0$ and $x^n \to -A^{-1}b$.
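The "well known" fact used in this proof can be checked directly for an SPQF; the following short derivation is a standard calculation supplied here for convenience. Along the search direction,

$$ J(x^n + \alpha s_n) = J(x^n) + \alpha\left(s_n, g(x^n)\right) + \tfrac{1}{2}\alpha^2 (s_n, As_n) $$

which is minimized at

$$ \alpha_n = -\frac{\left(s_n, g(x^n)\right)}{(s_n, As_n)} $$

so that the decrease in one step is

$$ J(x^n) - J(x^{n+1}) = \frac{\left(s_n, g(x^n)\right)^2}{2\,(s_n, As_n)} \ge 0 $$

Because the values $J(x^n)$ are nonincreasing and bounded below by the minimum value of $J$, the successive differences, and hence the quotient on the right, must tend to zero.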

Corollary: If $J$ is an SPQF and $V^{(0)} \ge A^{-1}$ or $V^{(0)} \le A^{-1}$, then the algorithm converges to the location of the minimum.

Proof: By theorem 5 and the corollary to theorem 6 the $V^{(n)}$ operators are uniformly strongly positive.

Now consider strictly convex twice-differentiable functions on a finite-dimensional Hilbert space.

Theorem 11: Suppose $J$ is strictly convex with bounded second partial derivatives; that is, suppose

(a) $m(x,x) \le (x,J''(x)x) \le M(x,x)$ for all $x \in H$, where $\infty > M \ge m > 0$ and $J''$ denotes the second derivative of $J$ (i.e., the Hessian of $J$)

(b) $J''(x)$ is nondecreasing along any path of nonincreasing function values

(c) For all $x_1, x_2 \in H$ and $\alpha \in (0,1)$, $J(\alpha x_1 + (1-\alpha)x_2) < \alpha J(x_1) + (1-\alpha)J(x_2)$.

Then the iterates generated by the PVM algorithm converge to the global minimum of $J$ if the $V^{(n)}$ operators are uniformly positive definite (i.e., there exist positive constants $\alpha, \beta$ such that $\alpha I \le V^{(n)} \le \beta I$ for all n).

Proof: This theorem follows from theorem 1 in reference 18.

For a serial version of the rank-one algorithm with step size chosen by a one-dimensional minimization, Goldfarb (ref. 18) has shown that for a strictly convex function on $R^n$ the $V^{(n)}$ operators are uniformly positive definite whenever $V^{(0)} \ge [J''(x^1)]^{-1}$ or $[J''(x^1)]^{-1} \ge V^{(0)}$. However, this result has not been extended to the parallel algorithm described herein.

EXAMPLE PROBLEMS AND RESULTS 

To illustrate the performance of the parallel variable metric minimization algo- 
rithm, numerical experiments were conducted on several standard example problems. 
Although the experiments were conducted on a serial computer, the results of the compu- 
tations were used as if they had been done in a parallel fashion. Five sample functions 
are used for the numerical experiments. 

A simple quadratic function of three variables

$$ f_1(x,y,z) = x^2 - 2xy + 2y^2 + 5z^2 $$

was the first function with $x^1 = (1,1,1)$. For this example p was chosen to be 3 and, as theorem 9 indicated, convergence was achieved in one cycle.

The second function was the well-traveled Rosenbrock's parabolic valley function

$$ f_2(x,y) = (x^2 - y)^2 + 0.01(x - 1)^2 $$

This is a particularly difficult function to minimize because of the long parabolic valley $y = x^2$ along which the minimization must travel. The traditional starting point is (-1.2, 1.0) and the minimum is located at (1,1).

The third function, known as Powell's function, is

$$ f_3(x_1,x_2,x_3,x_4) = (x_1 - 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4 $$

with starting values at (3,-1,0,1). This function is particularly difficult for a variable metric algorithm to minimize because at the minimum the Hessian is singular.

The fourth test function is called the 4-D banana or Wood's function and is defined by

$$ f_4(x_1,x_2,x_3,x_4) = 100(x_1^2 - x_2)^2 + (1 - x_1)^2 + 90(x_3^2 - x_4)^2 + (1 - x_3)^2 + 10.1\left[(x_2 - 1)^2 + (x_4 - 1)^2\right] + 19.8(x_2 - 1)(x_4 - 1) $$

The traditional starting estimate is at (-3,-1,-3,-1). This function is difficult to minimize because the quadratics $x_1^2 - x_2$ and $x_3^2 - x_4$ make the level surfaces banana shaped.

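The fifth function is the helical valley function defined by

$$ f_5(x_1,x_2,x_3) = 100\left[(x_3 - 10\theta)^2 + (r - 1)^2\right] + x_3^2 $$

where

$$ r = \left(x_1^2 + x_2^2\right)^{1/2} $$

$$ 2\pi\theta = \begin{cases} \tan^{-1}\dfrac{x_2}{x_1} & (x_1 > 0) \\ \pi + \tan^{-1}\dfrac{x_2}{x_1} & (x_1 < 0) \end{cases} $$

The usual starting estimate is at (-1,0,0).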
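For readers who wish to reproduce the experiments, the five test functions can be written down as a short Python sketch. The function names `f1` to `f5` mirror the text; the helical valley is coded in its commonly cited form, which is an assumption wherever the printed formula is hard to read, and `f3` keeps the sign of the first term as printed here (the commonly cited form of Powell's function uses $x_1 + 10x_2$). Note that `f5` is undefined at $x_1 = 0$.

```python
import math

def f1(x, y, z):
    # Simple quadratic; start (1, 1, 1).
    return x**2 - 2*x*y + 2*y**2 + 5*z**2

def f2(x, y):
    # Rosenbrock's parabolic valley as scaled in the text; start (-1.2, 1.0).
    return (x**2 - y)**2 + 0.01*(x - 1)**2

def f3(x1, x2, x3, x4):
    # Powell's function, first term as printed in the text; start (3, -1, 0, 1).
    return (x1 - 10*x2)**2 + 5*(x3 - x4)**2 + (x2 - 2*x3)**4 + 10*(x1 - x4)**4

def f4(x1, x2, x3, x4):
    # 4-D banana (Wood's function); start (-3, -1, -3, -1).
    return (100*(x1**2 - x2)**2 + (1 - x1)**2
            + 90*(x3**2 - x4)**2 + (1 - x3)**2
            + 10.1*((x2 - 1)**2 + (x4 - 1)**2)
            + 19.8*(x2 - 1)*(x4 - 1))

def f5(x1, x2, x3):
    # Helical valley (commonly cited form); start (-1, 0, 0). Requires x1 != 0.
    theta = math.atan(x2 / x1) / (2 * math.pi)
    if x1 < 0:
        theta += 0.5
    r = math.hypot(x1, x2)
    return 100*((x3 - 10*theta)**2 + (r - 1)**2) + x3**2

starts = {
    "f1": f1(1.0, 1.0, 1.0),
    "f2": f2(-1.2, 1.0),
    "f4": f4(-3.0, -1.0, -3.0, -1.0),
    "f5": f5(-1.0, 0.0, 0.0),
}
```

Evaluating each function at its starting point and at its known minimizer is a quick sanity check before running any minimization.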

On a present-day serial computer, numerical experiments were conducted in which the PVM algorithm was applied to the five standard test functions. For these experiments the convergence criterion was the absolute value of the largest component of the gradient less than $\epsilon = 10^{-7}$. The basis vectors were defined as $\sigma_i = 10^{q}\epsilon e_i$ (the exponent $q$ is illegible in the source), where $e_i$ is the ith elementary vector. The one-dimensional minimization required in step 3 of the PVM algorithm was carried out by Davidon's cubic interpolation method with initial estimate of the step size $\lambda = \min\left(2,\, -2\left(J(x^n) - J_{\min}\right)/\left(g(x^n), s_n\right)\right)$, where $J_{\min}$ is the estimated minimum value of $J$. The degree of parallelism p was chosen to be the number of variables in the test function and $V^{(0)} = I$.

Table 1 gives the results of these numerical experiments. For each test function 
the number of cycles required to achieve convergence is listed. Also listed is the total 
number of function and gradient evaluations required for convergence on the serial com- 
puter. The third column of results in table 1 presents the situation as if the computa- 
tions had been done on an advanced computer with stream or parallel computing capa- 
bilities. Then, the computation of the p gradients in step 1 would have been done by 
utilizing these capabilities. Therefore, ignoring overhead costs, these p gradients 
would take essentially the same time as the computation of one gradient. It is for this 
reason that in table 1 the number of evaluations for the parallel case is the same as for 
the serial case minus (p - 1) times the number of cycles. Columns four and five give 
measures of the accuracy of the minimization. 
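The relation between the serial and parallel columns can be checked with a few lines of Python. The helper name `parallel_evaluations` is hypothetical; the numbers are the cycle counts and serial evaluation counts reported in table 1, with p taken as the number of variables of each test function.

```python
def parallel_evaluations(serial, p, cycles):
    # Each cycle's p trial-point evaluations cost the time of one,
    # so (p - 1) evaluations are saved per cycle.
    return serial - (p - 1) * cycles

# function: (p, number of cycles, serial evaluations), taken from table 1
table1 = {
    "Quadratic": (3, 1, 7),
    "Rosenbrock": (2, 13, 58),
    "Powell": (4, 19, 118),
    "4-D banana": (4, 30, 185),
    "Helical valley": (3, 14, 75),
}

parallel = {name: parallel_evaluations(serial, p, cycles)
            for name, (p, cycles, serial) in table1.items()}
```

The resulting counts are the entries of the parallel column of table 1.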
TABLE 1.- RESULTS OF PVM METHOD APPLIED TO FIVE SAMPLE PROBLEMS

                            Total number     Total number
                            of function      of function                       Largest
                Number      and gradient     and gradient     Function         componentwise
                of          evaluations      evaluations      value at         error in solution
Function        cycles      (serial)         (parallel)       convergence      at convergence
--------------  ----------  ---------------  ---------------  ---------------  -----------------
Quadratic       1           7                5                1.5 x 10^-21     5 x 10^-11
Rosenbrock      13          58               45               7.7 x 10^-16     5 x 10^-7
Powell          19          118              61               4.6 x 10^-15     2 x 10^-4
4-D banana      30          185              95               9.7 x 10^-18     2 x 10^-9
Helical valley  14          75               47               1.2 x 10^-20     6 x 10^-12

Table 2 shows the progress to the minimum of the PVM algorithm for the first three test functions. Also recorded in table 2 is the number of function and gradient evaluations taken with each cycle. Notice that the convergence to the minimum occurs in one cycle for the quadratic as predicted by theorem 9. Also notice the apparently superlinear rate of convergence for Rosenbrock's function. It is conjectured that as $\sigma_j$ is made small, $V^{(n)} \to [J''(x^n)]^{-1}$. This conjecture is based on the fundamental property of $V^{(n)}$. The slowing convergence on Powell's function is due to the singularity of $J''$ at the minimum.

TABLE 2.- PROGRESS PER CYCLE OF PVM ON THREE EXAMPLE PROBLEMS

         Quadratic                    Rosenbrock's function        Powell's function
         ---------------------------  ---------------------------  ---------------------------
         Function       Number of     Function       Number of     Function       Number of
         value          function      value          function      value          function
Cycle                   evaluations                  evaluations                  evaluations
-------  -------------  ------------  -------------  ------------  -------------  ------------
0        6.0            1             4.8 x 10^-1    1             8.7 x 10^2     1
1        1.5 x 10^-21   6             4.7 x 10^-2    4             2.1 x 10^1     7
2                                     4.2 x 10^-2    5             2.1            6
3                                     1.9 x 10^-2    5             2.1 x 10^-1    7
4                                     1.6 x 10^-2    4             2.1 x 10^-2    6
5                                     6.0 x 10^-3    5             2.1 x 10^-3    6
6                                     3.9 x 10^-3    4             2.1 x 10^-4    6
7                                     1.8 x 10^-3    5             2.1 x 10^-5    6
8                                     6.4 x 10^-4    4             2.1 x 10^-6    6
9                                     2.6 x 10^-4    4             2.1 x 10^-7    6
10                                    2.3 x 10^-5    5             2.1 x 10^-8    6
11                                    4.7 x 10^-8    4             2.1 x 10^-9    6
12                                    2.8 x 10^-11   4             2.2 x 10^-10   6
13                                    7.7 x 10^-16   4             2.4 x 10^-11   6
14                                                                 3.1 x 10^-12   6
15                                                                 4.6 x 10^-13   6
16                                                                 8.6 x 10^-14   6
17                                                                 1.9 x 10^-14   6
18                                                                 1.8 x 10^-14   6
19                                                                 4.6 x 10^-15   7
Total                   7                            58                           118
Table 3 illustrates a comparison of the performance of the DFP algorithm with the PVM algorithm on the two example functions, $f_2$ and $f_3$. The DFP method was chosen as the standard of comparison because of its wide use and because it has generally compared favorably with other techniques. The key factor in the performance of any minimization technique is the number of function and gradient evaluations required to converge, as the computation required for evaluating the function is usually much greater than that involved in the algorithm. Table 3 lists the number of function evaluations required by the DFP method to locate the minimum of $f_2$ and $f_3$ starting from the same initial conditions. These results have been obtained by other investigators (refs. 21 and 22). Also listed in table 3 are results on these problems reported by Jacobson and Oksman (ref. 23) for a DFP subroutine. Finally, in table 3 the performance of the parallel variable metric method is given for two cases. In the first case, the method is used on a serial computer, hence the operations in step 1 (i.e., gradient evaluations) are not done in parallel. For the second case, it is assumed that the operations in step 1 are carried out in parallel. Hence the p gradient evaluations of step 1 will require only the time to carry out one evaluation. Thus the entry in table 3 for the parallel case is merely the same as that for the serial case minus (p - 1) times the number of cycles.


TABLE 3.- COMPARISON OF NUMBER OF FUNCTION EVALUATIONS REQUIRED TO
ACHIEVE CONVERGENCE FOR PVM WITH RESULTS OF OTHER RESEARCHERS

                                        Number of function evaluations required to
                                        achieve convergence for -
                                        ------------------------------------------
Method                                  Rosenbrock's function   Powell's function
--------------------------------------  ----------------------  ------------------
Davidon-Fletcher-Powell (Greenstadt)    192                     134 to f = 10^-11
DFP (Straeter)                          101                     Not reported
DFP (Jacobson)                          165                     80 to f = 10^-8
PVM in serial                           58                      63 to f = 10^-8
                                                                81 to f = 10^-11
                                                                118 to f = 10^-15
PVM in parallel                         44                      61
CONCLUDING REMARKS

An algorithm designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers with these special features has been presented. Several properties of this algorithm were established herein. In addition, the convergence of the iterates to the solution has been proved for a quadratic functional on a real separable Hilbert space; in fact, for a finite-dimensional space the convergence is achieved in one cycle, when the degree of parallelism equals the number of independent variables. Results of numerical experiments indicate that the new algorithm will exploit the parallel or pipeline computing capabilities of the new computers to effect faster convergence than serial techniques currently in use. In fact, the experiments indicated that even when the computations are done serially, the new algorithm is very competitive with the widely used Davidon-Fletcher-Powell technique.

Langley Research Center, 

National Aeronautics and Space Administration, 

Hampton, Va., September 6, 1973. 